-
Notifications
You must be signed in to change notification settings - Fork 23
Implement XXHash Caching and Refactor Core Rclone Utilities #121
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Implement XXHash Caching and Distributed LocksOverviewThis change introduces a persistent XXHash caching mechanism for local files. It implements a new distributed locking system for B2 storage. Core rclone operations and configuration initialization are significantly refactored. Technical Highlights
Impact
|
| SyncStatusDifferent = "*" | ||
| ) | ||
|
|
||
| // InitializeConfig sets up global configuration variables |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed sync status constants. Clarify if these constants are replaced or if the sync status logic has been removed entirely.
| } | ||
|
|
||
| // Load config from b2m.toml | ||
| // Load config from b2m.toml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is duplicated.
| // Fetch user details | ||
| fetchUserDetails() | ||
|
|
||
| AppConfig.LocalDBDir = filepath.Join(AppConfig.ProjectRoot, "db", "all_dbs") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The path db/all_dbs is hardcoded. Consider making all_dbs a configurable constant if it might change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will use in toml file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This issue has been logged and will be investigated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds like a good plan. Moving all_dbs to the toml file will make it configurable and addresses the concern about hardcoding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update. Using the TOML file for all_dbs will effectively address the hardcoding concern. Please ensure this change is implemented to make the path configurable.
| fetchUserDetails() | ||
|
|
||
| AppConfig.LocalDBDir = filepath.Join(AppConfig.ProjectRoot, "db", "all_dbs") | ||
| AppConfig.LocalVersionDir = filepath.Join(AppConfig.LocalDBDir, ".b2m", "version") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new .b2m directory is introduced. Ensure this change is communicated to users regarding local file structure.
| return nil | ||
| } | ||
|
|
||
| func findProjectRoot() (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The findProjectRoot function now searches for db or go.mod. This is a robust approach, but go.mod might not always indicate the project root for non-Go specific parts.
| } | ||
| parent := filepath.Dir(dir) | ||
| if parent == dir { | ||
| return "", fmt.Errorf("root not found (searched for 'db' dir or 'go.mod')") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message root not found is generic. Specify which of the two conditions (db dir or go.mod) failed.
| LogError("DownloadDatabase rclone copy failed for %s: %v", dbName, err) | ||
| return fmt.Errorf("download of %s failed: %w", dbName, err) | ||
| } | ||
| description := "Downloading " + dbName |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The description parameter for RcloneCopy is hardcoded. Consider making it more dynamic or passing it from the caller for better context in logs.
| // 3.1. Calculate Local Hash of the newly downloaded file | ||
| localDBPath := filepath.Join(config.AppConfig.LocalDBDir, dbName) | ||
| localHash, err := CalculateSHA256(localDBPath) | ||
| localHash, err := CalculateXXHash(localDBPath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed from CalculateSHA256 to CalculateXXHash. XXHash is faster but not cryptographically secure. Confirm if this change aligns with the security requirements for integrity checks.
| ) | ||
|
|
||
| // LockDatabase creates a .lock file | ||
| func LockDatabase(ctx context.Context, dbName, owner, host, intent string, force bool) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The force parameter for LockDatabase allows overriding existing locks. This could lead to data corruption if not used carefully.
| filename := fmt.Sprintf("%s.%s.%s.lock", dbName, owner, host) | ||
|
|
||
| // If forcing, we first clean up ALL existing locks for this DB to ensure we start fresh. | ||
| if force { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When force is true, all existing locks for the specific database are cleaned up. This scope (dbName.*.lock) is crucial and prevents accidental deletion of other database locks.
| } | ||
| defer os.Remove(tmpFile) | ||
|
|
||
| // Use RcloneCopy to upload the lock file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The FetchLocks followed by RcloneCopy is not an atomic operation. A race condition could occur where another client acquires a lock between the FetchLocks call and the RcloneCopy operation.
| } | ||
|
|
||
| // UnlockDatabase removes the .lock file | ||
| func UnlockDatabase(ctx context.Context, dbName, owner string, force bool) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The force parameter for UnlockDatabase allows deleting all locks for a database. This is powerful and potentially dangerous if misused.
| if force { | ||
| // Use rclone delete with include pattern | ||
| // Pattern: dbName.*.lock | ||
| pattern := fmt.Sprintf("%s.*.lock", dbName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pattern for rclone delete uses dbName.*.lock. If dbName itself contains wildcards, this could lead to unintended deletions.
|
|
||
| // cachedHash stores the hash and file stat info to avoid re-hashing unchanged files | ||
| type cachedHash struct { | ||
| Hash string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider using time.Time directly for ModTime instead of int64 (UnixNano). This improves type safety and readability.
| } | ||
|
|
||
| var ( | ||
| fileHashCache = make(map[string]cachedHash) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Global variables for cache and mutex can lead to tight coupling and make testing difficult.
| func CalculateXXHash(filePath string) (string, error) { | ||
| info, err := os.Stat(filePath) | ||
| if err != nil { | ||
| LogError("CalculateXXHash: Failed to stat file %s: %v", filePath, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The LogError call duplicates the error return. The caller will also receive the error.
|
|
||
| if ok && cached.ModTime == info.ModTime().UnixNano() && cached.Size == info.Size() { | ||
| return cached.Hash, nil | ||
| } else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The else keyword is unnecessary here. The if block returns, so the following code is implicitly the else branch.
| if ok && cached.ModTime == info.ModTime().UnixNano() && cached.Size == info.Size() { | ||
| return cached.Hash, nil | ||
| } else { | ||
| LogInfo("Cache miss for %s. Cached: %v, Current: ModTime=%d, Size=%d", filepath.Base(filePath), ok, info.ModTime().UnixNano(), info.Size()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logging filepath.Base(filePath) might be insufficient for debugging if multiple files with the same base name exist in different directories.
| // Calculate hash | ||
| f, err := os.Open(filePath) | ||
| if err != nil { | ||
| LogError("CalculateXXHash: Failed to open file %s: %v", filePath, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The LogError call duplicates the error return. The caller will also receive the error.
| // Use streaming digest | ||
| h := xxh3.New() | ||
| if _, err := io.Copy(h, f); err != nil { | ||
| LogError("CalculateXXHash: io.Copy failed for %s: %v", filePath, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The LogError call duplicates the error return. The caller will also receive the error.
| } | ||
|
|
||
| // Sum64 returns uint64, format as hex string for compatibility | ||
| hash := fmt.Sprintf("%016x", h.Sum64()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The hash is formatted as a 16-character hexadecimal string. Ensure this format is consistent with how hashes are stored and compared elsewhere in the system.
|
|
||
| data, err := os.ReadFile(cachePath) | ||
| if err != nil { | ||
| LogError("LoadHashCache: Failed to read cache file: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The LogError call duplicates the error return. The caller will also receive the error.
| fileHashCacheMu.Lock() | ||
| defer fileHashCacheMu.Unlock() | ||
|
|
||
| if err := json.Unmarshal(data, &fileHashCache); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The LogError call duplicates the error return. The caller will also receive the error.
|
|
||
| if err := json.Unmarshal(data, &fileHashCache); err != nil { | ||
| LogError("LoadHashCache: Failed to unmarshal cache: %v", err) | ||
| return fmt.Errorf("failed to unmarshal cache: %w", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returning a wrapped error fmt.Errorf("failed to unmarshal cache: %w", err) is good practice.
|
|
||
| // Ensure directory exists | ||
| if err := os.MkdirAll(config.AppConfig.LocalAnchorDir, 0755); err != nil { | ||
| LogError("SaveHashCache: Failed to create directory: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The LogError call duplicates the error return. The caller will also receive the error.
| } | ||
|
|
||
| fileHashCacheMu.RLock() | ||
| data, err := json.MarshalIndent(fileHashCache, "", " ") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using json.MarshalIndent for cache files can lead to larger file sizes and slower read/write times for very large caches.
| fileHashCacheMu.RUnlock() | ||
|
|
||
| if err != nil { | ||
| LogError("SaveHashCache: Failed to marshal cache: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The LogError call duplicates the error return. The caller will also receive the error.
| } | ||
|
|
||
| if err := os.WriteFile(cachePath, data, 0644); err != nil { | ||
| LogError("SaveHashCache: Failed to write file: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The LogError call duplicates the error return. The caller will also receive the error.
| logPath := filepath.Join(appConfigDir, "b2m.log") | ||
| // User requested logs in current directory | ||
| // logPath := "b2m.log" | ||
| logPath := "b2m.log" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Placing the log file directly in the current working directory might cause issues if the application is run from a read-only location or if multiple instances are run from different directories.
| if err := cmd.Run(); err != nil { | ||
| LogError("DownloadAndLoadMetadata: rclone sync failed: %v", err) | ||
| // Use RcloneSync helper | ||
| if err := RcloneSync(config.AppConfig.VersionDir, config.AppConfig.LocalVersionDir); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The RcloneSync helper should handle the 'remote metadata directory not found' case gracefully, possibly returning nil or an empty map if the remote is truly empty.
| return result, nil | ||
| } | ||
|
|
||
| // FetchSingleRemoteMetadata downloads and parses the metadata for a specific DB from the remote version dir |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This new function FetchSingleRemoteMetadata is a good addition for targeted metadata retrieval, avoiding full syncs.
| func FetchSingleRemoteMetadata(ctx context.Context, dbName string) (*model.Metadata, error) { | ||
| // Get file ID (db name without .db extension). Use simple string trimming; | ||
| // if dbName doesn't end in .db, fileID will equal dbName. | ||
| fileID := strings.TrimSuffix(dbName, ".db") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This TrimSuffix logic is fragile. If dbName is mydb.db.backup, fileID becomes mydb.db.backup. Consider a more robust way to derive fileID if it needs to be a specific identifier.
| } | ||
|
|
||
| // Use RcloneCopy (it takes a file source and a directory destination usually, but let's check RcloneCopy impl) | ||
| // RcloneCopy signature: func RcloneCopy(ctx context.Context, src, dst, description string, quiet bool, onProgress func(model.RcloneProgress)) error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment about RcloneCopy signature is helpful but should be removed or updated once the RcloneCopy implementation is finalized.
| // "Copy the source to the destination. Doesn't transfer unchanged files." | ||
|
|
||
| // Let's use RcloneCopy with quiet=true. | ||
| if err := RcloneCopy(ctx, "copy", remotePath, localDir, "Fetching metadata", true, nil); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Error message failed to fetch remote metadata is generic. Include dbName for better context.
| data, err := os.ReadFile(localFile) | ||
| if err != nil { | ||
| if os.IsNotExist(err) { | ||
| return nil, nil // Metadata doesn't exist remotely yet (New DB) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returning nil, nil if metadata doesn't exist remotely is a valid strategy for new DBs, but ensure callers handle this nil metadata case correctly.
| continue | ||
| } | ||
|
|
||
| // Update local anchor (local-version) to match the new metadata |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a critical addition. After uploading new metadata, the local anchor (local-version) must be updated to reflect the new state.
| } | ||
|
|
||
| // Update local anchor (local-version) to match the new metadata | ||
| // This ensures we don't show "DB Outdated" immediately after generation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment explains the 'why' for updating the local anchor, which is good.
|
|
||
| // Update local anchor (local-version) to match the new metadata | ||
| // This ensures we don't show "DB Outdated" immediately after generation | ||
| if err := UpdateLocalVersion(dbName, *meta); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Error handling for UpdateLocalVersion is marked as 'Non-critical but annoying'. Consider if a failure to update the local anchor should halt the batch process or be logged more prominently.
| rcloneArgs := []string{"copy", | ||
| localPath, | ||
| config.AppConfig.RootBucket, | ||
| // RcloneCopy copies source to destination using rclone copy/copyto with options |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This new RcloneCopy function generalizes rclone copy operations, improving reusability.
Description
Fixes # (issue)
Type of Change
Please delete options that are not relevant.
How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration.
Screenshots (if applicable)
Add screenshots to help explain your changes.
Checklist
Additional Notes
Add any other context about the pull request here.